A Telegram Corpus for Hate Speech, Offensive Language, and Online Harm

نویسندگان

چکیده

We provide a new text corpus from the social medium Telegram, which is rich in indirect forms of divisive speech. scraped all messages one channel Donald Trump supporters, covering large part his presidency, late 2016 until January 2021, including 6 Capitol riot. The discussion among group members, over this long time period, includes spread disinformation, disparaging out-group and other harmful To enable research into role speech political discourse, we added two types annotations to corpus: (i) automatic offensive language for messages, (ii) our own manual portion posts leading up 2021 riot its aftermath.

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automated Hate Speech Detection and the Problem of Offensive Language

A key challenge for automatic hate-speech detection on social media is the separation of hate speech from other instances of offensive language. Lexical detection methods tend to have low precision because they classify all messages containing particular terms as hate speech and previous work using supervised learning has failed to distinguish between the two categories. We used a crowd-sourced...

متن کامل

Measuring Offensive Speech in Online Political Discourse

The Internet and online forums such as Reddit have become an increasingly popular medium for citizens to engage in political conversations. However, the online disinhibition effect resulting from the ability to use pseudonymous identities may manifest in the form of offensive speech, consequently making political discussions more aggressive and polarizing than they already are. Such environment...

متن کامل

TELEGRAM: A Grammar Formalism for Language Planning

Planning provides the basis for a theory of language generation that considers the communicative goals of the speaker when producing utterances. One central problem in designing a system based on such a theory is specifying the requisite linguistic knowledge in a form that interfaces well with a planning system and allows for the encoding of discourse information. The TELEGRAM (TELEological GRA...

متن کامل

Automatic Detection of Online Jihadist Hate Speech

We have developed a system that automatically detects online jihadist hate speech with over 80% accuracy, by using techniques from Natural Language Processing and Machine Learning. The system is trained on a corpus of 45,000 subversive Twitter messages collected from October 2014 to December 2016. We present a qualitative and quantitative analysis of the jihadist rhetoric in the corpus, examine...

متن کامل

developing a pattern based on speech acts and language functions for developing materials for the course “ the study of islamic texts translation”

هدف پژوهش حاضر ارائه ی الگویی بر اساس کنش گفتار و کارکرد زبان برای تدوین مطالب درس "بررسی آثار ترجمه شده ی اسلامی" می باشد. در الگوی جدید، جهت تدوین مطالب بهتر و جذاب تر، بر خلاف کتاب-های موجود، از مدل های سطوح گفتارِ آستین (1962)، گروه بندی عملکردهای گفتارِ سرل (1976) و کارکرد زبانیِ هالیدی (1978) بهره جسته شده است. برای این منظور، 57 آیه ی شریفه، به صورت تصادفی از بخش-های مختلف قرآن انتخاب گردید...

15 صفحه اول

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Journal of open humanities data

سال: 2021

ISSN: ['2059-481X']

DOI: https://doi.org/10.5334/johd.32